AITopics | high leverage point

New Subsampling Algorithms for Fast Least Squares Regression

Yichao Lu, Dean Foster

Neural Information Processing Systems, Mar-13-2024, 16:06:34 GMT

We address the problem of fast estimation of ordinary least squares (OLS) from large amounts of data ($n \gg p$). We propose three methods which solve the big data problem by subsampling the covariance matrix using either single-stage or two-stage estimation.

algorithm, matrix, uluru, (13 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

A sub-sampling algorithm preventing outliers

Deldossi, L., Pesce, E., Tommasi, C.

arXiv.org Machine Learning, Aug-12-2022

Massive datasets are now available in many fields, and for several reasons it can be convenient to analyze only a subset of the data. The D-optimality criterion can be used to select an optimal subsample of observations. However, it is well known that D-optimal support points lie on the boundary of the design space; if they coincide with extreme response values, they can severely influence the estimated linear model (leverage points with high influence). To overcome this problem, we first propose an unsupervised exchange procedure that selects a nearly D-optimal subset of observations without high leverage values. We then provide a supervised version of this exchange procedure that avoids not only high leverage points but also outliers in the responses (those not associated with high leverage points). This is possible because, unlike in other design situations, the response values may be available when subsampling from big datasets. Finally, both the unsupervised and the supervised selection procedures are generalized to I-optimality, with the goal of obtaining accurate predictions.
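To make the notion of a high leverage point concrete, here is a minimal Python sketch for the simplest case, a straight-line fit with one predictor, where the leverage of point i has the closed form h_i = 1/n + (x_i - mean)^2 / sum_j (x_j - mean)^2. This is not the exchange procedure from the abstract above; the function names and the 2p/n cutoff (a common rule of thumb) are assumptions chosen for illustration.

```python
def leverages(xs):
    """Leverage h_i of each point in simple linear regression y = a + b*x."""
    n = len(xs)
    mean = sum(xs) / n
    sxx = sum((x - mean) ** 2 for x in xs)
    return [1.0 / n + (x - mean) ** 2 / sxx for x in xs]


def high_leverage_indices(xs, factor=2.0):
    """Indices whose leverage exceeds factor * p / n (assumed rule of thumb).

    p = 2 because the model has an intercept and a slope.
    """
    n = len(xs)
    threshold = factor * 2 / n
    return [i for i, h in enumerate(leverages(xs)) if h > threshold]


# Five clustered design points and one far from the rest.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
flagged = high_leverage_indices(xs)
print(flagged)  # only the isolated point at x = 20 is flagged
```

The leverages always sum to the number of parameters (here 2), so a single isolated design point can absorb most of that total, which is why such points can dominate the fit when their responses are also extreme.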

artificial intelligence, high leverage point, machine learning, (15 more...)

arXiv.org Machine Learning

2208.06218

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Italy > Lombardy > Milan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.48)

New Subsampling Algorithms for Fast Least Squares Regression

Dhillon, Paramveer, Lu, Yichao, Foster, Dean P., Ungar, Lyle

Neural Information Processing Systems, Dec-31-2013

We address the problem of fast estimation of ordinary least squares (OLS) from large amounts of data ($n \gg p$). We propose three methods which solve the big data problem by subsampling the covariance matrix using either single-stage or two-stage estimation. All three run in time on the order of the input size, i.e. $O(np)$, and our best method, Uluru, gives an error bound of $O(\sqrt{p/n})$ that is independent of the amount of subsampling as long as it is above a threshold. We provide theoretical bounds for our algorithms in the fixed design setting (with Randomized Hadamard preconditioning) as well as the sub-Gaussian random design setting. We also compare the performance of our methods on synthetic and real-world datasets and show that if observations are i.i.d. and sub-Gaussian, one can subsample directly, skipping the expensive Randomized Hadamard preconditioning, without loss of accuracy.
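The general idea of subsampling the covariance matrix can be sketched in a few lines of plain Python: estimate the p x p second-moment matrix from a random subsample of rows, compute the cross-moment with the response from all rows, and solve the resulting normal equations. This is a single-stage illustration under assumed names (`solve`, `subsampled_ols`), with uniform sampling and no preconditioning; it is not the paper's Uluru method and makes no claim to reproduce its error bounds.

```python
import random


def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * p
    for i in reversed(range(p)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, p))
        w[i] = (m[i][p] - s) / m[i][i]
    return w


def subsampled_ols(X, y, n_sub, seed=0):
    """OLS estimate using a subsampled second-moment matrix (illustrative)."""
    n, p = len(X), len(X[0])
    idx = random.Random(seed).sample(range(n), n_sub)
    # X^T X / n_sub from the subsample only: O(n_sub * p^2) work.
    cov = [[sum(X[i][j] * X[i][k] for i in idx) / n_sub for k in range(p)]
           for j in range(p)]
    # X^T y / n from every row: O(n * p) work.
    xty = [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(p)]
    return solve(cov, xty)


# Synthetic Gaussian data with n much larger than p.
rng = random.Random(1)
true_w = [2.0, -1.0]
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20000)]
y = [true_w[0] * r[0] + true_w[1] * r[1] + rng.gauss(0, 0.1) for r in X]
w_hat = subsampled_ols(X, y, n_sub=5000)
print(w_hat)  # close to true_w, up to subsampling error
```

Only the p x p matrix is built from the subsample, so the expensive O(np^2) step of exact OLS shrinks to O(n_sub p^2) while the remaining pass over the full data stays at O(np). Setting `n_sub` equal to the number of rows recovers the ordinary least squares solution.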

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)
